Abstract. Prior knowledge about shape can be important for image
segmentation. In particular, a number of methods have been proposed to
compute statistics on a set of training shapes, which are then used as the
shape prior for a given image segmentation task. In this work, we perform a
comparative analysis of shape learning techniques such as linear PCA, kernel
PCA, and locally linear embedding, and propose a new method, kernelized
locally linear embedding, for shape analysis. The surfaces are represented as
the zero level set of a signed distance function, and shape learning is
performed on the embeddings of these shapes. We carry out experiments to see
how well each of these methods can represent a shape, given the training set.


1  Introduction 

Image  Segmentation  has  been  a  topic  of  extensive  research  in  the  computer  vision 
community  [1-4]. One  of  the  challenges  in  the  field  of  image  segmentation  is  the 
incorporation of prior shape knowledge in the segmentation process [5]. Many
different methods (using either parameterized or implicit representations of
shapes) have been proposed [6-10] to perform statistical shape analysis on a
given set of training shapes. In this work, we perform a comparative analysis
of several key techniques such as linear PCA (LPCA), kernel PCA (KPCA), and
locally linear embedding (LLE), and then propose a new method, kernelized
locally linear embedding (KLLE), which is compared with the aforementioned
techniques.

There  is  a  large  body  of  literature  available  for  representing  a  curve  or  surface 
using  parameterized  as  well  as  implicit  methods;  see  [3, 11, 12]  and  the  references 
therein.  A  number  of  methods  have  been  proposed,  using  these  representations, 
to study the statistical variations in a given set of training shapes. Cootes
et al. [6] developed a parametric point distribution model for describing the
segmenting curve by using linear combinations of eigenvectors that reflect
variations from the mean shape. In [13], Wang and Staib developed a
statistical point model by applying linear PCA to the covariance matrices that
capture the statistical variations of the landmark points. Recently, Leventon
[9] proposed a more general model wherein PCA was performed on a set of signed
distance functions. Kernel PCA has been successfully used by the machine
learning community for pattern recognition and image denoising [14]. A
Gaussian kernel was used by Cremers et al. [8] for learning shape statistics
in the kernel space to provide a shape prior for segmentation tasks. Finding
the pre-image of the projection in the kernel space is one of the challenging
tasks in visualizing and computing the performance of these kernel-based
techniques. In this work, we use the method proposed in [15] to find the
pre-image of the projection in the KPCA space and compare it with LPCA.

Locally linear embedding has been widely used for dimensionality reduction
and for extracting the nonlinearities in a training data set. In this work, we
use LLE to represent a given shape using a linear combination of its nearest
neighbors. We further develop this algorithm and propose a new method to
perform LLE in the kernel space, called KLLE, and show that it is better than
LLE and LPCA and is comparable to KPCA in terms of performance, but with fewer
computations. Of course, the literature reviewed above is by no means
exhaustive. We merely want to point out a new technique that has some
attractive features and may act as an alternative to some existing
methodologies.

The rest of the paper is organized as follows. In the next section we briefly
describe LPCA, KPCA, and LLE and provide details about KLLE. In Section 3, we
show with examples how well each of these methods performs on a given data
set. Section 4 gives concluding remarks and future research directions.

2  Statistical  Models 

This section briefly describes each of the shape learning techniques used in
the sequel. Let $\tau = \{\phi_1, \phi_2, \ldots, \phi_n\}$ be the training
set, consisting of $n$ signed distance functions (SDFs) with the shapes
represented by the corresponding zero level sets. It is assumed that all the
$\phi_i$'s are aligned using a suitable method of registration [6].


2.1  Linear  PCA 

Linear PCA is widely used to learn the statistical variations of a given set
of data (shapes, in our case). LPCA assumes that the set of permissible shapes
forms a Gaussian distribution, i.e., all possible shapes can be written as a
linear combination of a set of eigenshapes obtained by performing principal
component analysis on the training data set [10, 9]. The eigenshapes can be
obtained as follows: Let $\phi_i$ represent the signed distance function
corresponding to the surface $S_i$. The mean surface, $\mu$, is computed by
taking the mean of the signed distance functions,
$\mu = \frac{1}{n}\sum_{i=1}^{n}\phi_i$. The variance in shape is computed
using PCA, i.e., the mean shape $\mu$ is subtracted from each $\phi_i$ to
create a mean-offset map $\tilde{\phi}_i$. Each such map, $\tilde{\phi}_i$, is
placed as a column vector in an $N \times n$-dimensional matrix $M$, where
$\phi_i \in \mathbb{R}^N$. Using Singular Value Decomposition (SVD), the
covariance matrix is decomposed as:

$$ U \Sigma U^T = \frac{1}{n} M M^T \tag{1} $$

where $U$ is a matrix whose column vectors represent the set of orthogonal
modes of shape variation (eigenshapes) and $\Sigma$ is a diagonal matrix of
corresponding eigenvalues. An estimate of a novel shape $\phi$ of the same
class of object can be obtained from $m$ principal components using an
$m$-dimensional vector of coefficients,

$$ \alpha = U_m^T (\phi - \mu) \tag{2} $$

where $U_m$ is a matrix consisting of the first $m$ columns of $U$. Given the
coefficients $\alpha$, an estimate of the shape $\phi$, namely $\hat{\phi}$,
can be obtained as [9, 10]:

$$ \hat{\phi} = U_m \alpha + \mu. \tag{3} $$
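
As a minimal sketch of equations (1)-(3), assuming each training SDF has been flattened into one row of a NumPy array, the eigenshapes and the projection of a test shape could be computed as follows. The function names `fit_lpca` and `project_lpca` are illustrative and are not taken from the original C++ implementation.

```python
import numpy as np

def fit_lpca(shapes):
    """shapes: (n, N) array, one flattened SDF per row (assumed layout)."""
    n = shapes.shape[0]
    mu = shapes.mean(axis=0)                 # mean shape
    M = (shapes - mu).T                      # N x n matrix of mean-offset maps
    # The left singular vectors of M / sqrt(n) are the eigenvectors of
    # (1/n) M M^T, and the squared singular values are its eigenvalues.
    U, s, _ = np.linalg.svd(M / np.sqrt(n), full_matrices=False)
    return mu, U, s ** 2

def project_lpca(phi, mu, U, m):
    """Estimate of phi from the first m eigenshapes."""
    Um = U[:, :m]
    alpha = Um.T @ (phi - mu)                # coefficients, eq. (2)
    return Um @ alpha + mu                   # reconstruction, eq. (3)
```

Taking the thin SVD of $M$ avoids forming the $N \times N$ covariance matrix explicitly, which matters when $N$ (the number of voxels) is much larger than $n$.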


2.2  Kernel  PCA 

Kernel methods, and kernel PCA in particular, have been a focus of research
in the pattern recognition community [16, 17]. The basic idea behind these
methods is to map the data in the input space, $\phi \in \chi$, to a feature
space $F$ via some nonlinear map $\varphi$ and then apply a linear method in
$F$ for further analysis. Kernel PCA [14] is a nonlinear feature extractor,
where PCA is performed in the feature space $F$, which is equivalent to doing
nonlinear PCA in the input space $\chi$. Since the nonlinear map $\varphi$ is
not known, a challenging problem is to find the pre-image of the projection
obtained by doing PCA in the feature space $F$. As demonstrated by Mika [16],
the exact pre-image typically may not exist and one can only settle for an
approximate solution. Even this may be non-trivial, as the dimensionality of
the feature space can be infinite. For certain invertible kernels, this
nonlinear problem can be solved using a fixed-point iteration method as
proposed by Schölkopf and Mika [14, 16]. However, this method depends on the
initial starting point and is highly susceptible to local minima. To
circumvent this problem, [17] and more recently [15] proposed an algorithm to
reconstruct an approximate pre-image of the projection, as described briefly
in the remainder of this section.

Kernel PCA performs the traditional linear PCA in the feature space
corresponding to the kernel $k(\cdot,\cdot)$. The kernel defines the inner
product between two points in the feature space, i.e.,
$k(\phi_1,\phi_2) = \langle \varphi(\phi_1), \varphi(\phi_2) \rangle$. This
fact can be used to obtain the eigenvectors in the feature space $F$ even
though the nonlinear map $\varphi$ is unknown. Analogous to linear PCA, it
involves the following eigen-decomposition

$$ HKH = U \Lambda U^T, $$

where $K$ is the kernel matrix with entries $K_{ij} = k(\phi_i,\phi_j)$, $H$
is the centering matrix given by

$$ H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T, $$

$I$ is the $n \times n$ identity matrix, $\mathbf{1} = [1\,1\,\ldots\,1]^T$ is
an $n \times 1$ vector, $U = [a_1, \ldots, a_n]$ with
$a_i = [a_{i1}, \ldots, a_{in}]^T$ is the matrix containing the eigenvectors,
and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ contains the
corresponding eigenvalues. Denote the mean of the $\varphi$-mapped data by
$\bar{\varphi} = \frac{1}{n}\sum_{i=1}^{n}\varphi(\phi_i)$ and define the
“centered” map $\tilde{\varphi}$ as:

$$ \tilde{\varphi}(\phi) = \varphi(\phi) - \bar{\varphi}. $$

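The $k$-th orthonormal eigenvector of the covariance matrix in the feature
space can then be shown to be [14]

$$ V_k = \sum_{i=1}^{n} \frac{a_{ki}}{\sqrt{\lambda_k}}\,\tilde{\varphi}(\phi_i). $$

Denote the projection of the $\varphi$-image of a test point $\phi$ onto the
$k$-th component by $\beta_k$. Then,

$$ \beta_k = \tilde{\varphi}(\phi)^T V_k = \frac{1}{\sqrt{\lambda_k}} \sum_{i=1}^{n} a_{ki}\,\tilde{k}(\phi,\phi_i) \tag{4} $$

where

$$ \tilde{k}(x,y) = \langle \tilde{\varphi}(x), \tilde{\varphi}(y) \rangle = k(x,y) - \frac{1}{n}\mathbf{1}^T k_x - \frac{1}{n}\mathbf{1}^T k_y + \frac{1}{n^2}\mathbf{1}^T K \mathbf{1} \tag{5} $$

with $k_x = [k(x,\phi_1), \ldots, k(x,\phi_n)]^T$.

The projection of $\varphi(\phi)$ onto the subspace spanned by the first $m$
eigenvectors is given by:

$$ P^m\varphi(\phi) = \sum_{k=1}^{m} \beta_k V_k + \bar{\varphi}. $$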
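
The steps above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the names `fit_kpca` and `kpca_coefficients` are hypothetical, and the kernel matrix is assumed to be precomputed from a matrix `D2` of pairwise squared distances.

```python
import numpy as np

def gaussian_kernel_matrix(D2, sigma):
    """Gaussian kernel from a matrix of squared distances D2."""
    return np.exp(-D2 / (2.0 * sigma ** 2))

def fit_kpca(K):
    """Eigen-decomposition of the centered kernel matrix HKH."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    lam, A = np.linalg.eigh(H @ K @ H)           # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]                # re-sort to descending
    return H, lam[order], A[:, order]            # column k of A is a_k

def kpca_coefficients(k_phi, K, H, lam, A, m):
    """Projections beta_1..beta_m of a test point, eq. (4).

    k_phi[i] = k(phi, phi_i) for the test point phi.
    """
    n = K.shape[0]
    # Vector of centered kernel values k~(phi, phi_i), eq. (5).
    k_tilde = H @ (k_phi - K @ np.ones(n) / n)
    return (A[:, :m].T @ k_tilde) / np.sqrt(lam[:m])
```

With a linear kernel, `K = X @ X.T`, the coefficients returned here agree (up to sign) with the linear PCA coefficients of the same test point, which is a convenient sanity check.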

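To obtain an approximate pre-image of $P^m\varphi(\phi)$ in the input space,
we minimize the error
$\rho(\hat{\phi}) = \| \varphi(\hat{\phi}) - P^m\varphi(\phi) \|^2$. Following
the exposition in [15], for a Gaussian kernel (also known as a radial basis
function) given by:

$$ k(\phi_i,\phi_j) = \exp\left( -\frac{d^2(\phi_i,\phi_j)}{2\sigma^2} \right) \tag{6} $$

where $d^2(\phi_i,\phi_j)$ is a distance measure in the input space, one can
obtain an approximate pre-image by setting $\nabla_{\hat{\phi}}\rho = 0$ and
using the approximation $\varphi(\hat{\phi}) \approx P^m\varphi(\phi)$. Here,
we directly state the result for finding the pre-image $\hat{\phi}$ (in the
input space $\chi$) of the projection $P^m\varphi(\phi)$ [15]:

$$ \hat{\phi} = \frac{\sum_{i=1}^{n} \tilde{\gamma}_i \left( \frac{1}{2}\left( 2 - \tilde{d}^2(P^m\varphi(\phi), \varphi(\phi_i)) \right) \right) \phi_i}{\sum_{i=1}^{n} \tilde{\gamma}_i \left( \frac{1}{2}\left( 2 - \tilde{d}^2(P^m\varphi(\phi), \varphi(\phi_i)) \right) \right)} \tag{7} $$

where $\gamma_i = \sum_{k=1}^{m} \frac{\beta_k}{\sqrt{\lambda_k}}\,a_{ki}$ and
$\tilde{\gamma}_i = \gamma_i + \frac{1}{n}\left(1 - \sum_{j=1}^{n}\gamma_j\right)$,
and the feature-space distance $\tilde{d}^2$ can be computed only in terms of
the kernel using the following expression [15, 17]:

$$ \tilde{d}^2(\varphi(\phi_i), P^m\varphi(\phi)) = \left( k_\phi + \frac{1}{n}K\mathbf{1} - 2k_{\phi_i} \right)^T H^T M H \left( k_\phi - \frac{1}{n}K\mathbf{1} \right) + \frac{1}{n^2}\mathbf{1}^T K \mathbf{1} + K_{ii} - \frac{2}{n}\mathbf{1}^T k_{\phi_i} \tag{8} $$

where $M = \sum_{k=1}^{m} \frac{1}{\lambda_k}\,a_k a_k^T$.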
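
The following NumPy sketch shows one way the pre-image computation could be organized. It assumes a Gaussian kernel (so that $k(\phi,\phi) = 1$), training shapes stored as rows of `phi_train`, and unit-norm eigenvectors of $HKH$. All function names are hypothetical, and the formulas follow the standard kernel PCA pre-image derivation rather than the original C++ code.

```python
import numpy as np

def _top_eig(K, m):
    """Top-m eigenpairs of the centered kernel matrix HKH."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    lam, A = np.linalg.eigh(H @ K @ H)
    idx = np.argsort(lam)[::-1][:m]
    return H, lam[idx], A[:, idx]

def projection_weights(K, k_phi, m):
    """gamma~ such that P^m varphi(phi) = sum_i gamma~_i varphi(phi_i)."""
    n = K.shape[0]
    H, lam, A = _top_eig(K, m)
    k_tilde = H @ (k_phi - K.sum(axis=1) / n)
    beta = (A.T @ k_tilde) / np.sqrt(lam)        # projections beta_k
    gamma = A @ (beta / np.sqrt(lam))
    return gamma + (1.0 - gamma.sum()) / n

def feature_dist_sq(K, k_phi, m):
    """Squared feature-space distance between P^m varphi(phi) and each
    training image varphi(phi_i), written in terms of the kernel only."""
    n = K.shape[0]
    H, lam, A = _top_eig(K, m)
    M = (A / lam) @ A.T                          # sum_k a_k a_k^T / lambda_k
    K1 = K.sum(axis=1) / n                       # (1/n) K 1
    right = H @ M @ H @ (k_phi - K1)
    left = (k_phi + K1)[:, None] - 2.0 * K       # column i: k_phi + K1 - 2 k_{phi_i}
    return left.T @ right + K.sum() / n ** 2 + np.diag(K) - 2.0 * K1

def kpca_preimage(phi_train, K, k_phi, m):
    """Approximate pre-image as a weighted mean of the training shapes."""
    w = projection_weights(K, k_phi, m) * 0.5 * (2.0 - feature_dist_sq(K, k_phi, m))
    return (w @ phi_train) / w.sum()
```

When the test point is itself a training shape and $m = n - 1$, the projection is exact and the sketch returns that training shape, which gives a simple check of the implementation.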

In this work, we have used the following shape similarity measure given by
[18]:

$$ d^2(\phi_i,\phi_j) = \int_{p \in Z(\phi_i)} EDT_{\phi_j}(p)\,dp + \int_{p \in Z(\phi_j)} EDT_{\phi_i}(p)\,dp, \tag{9} $$

where $EDT_{\phi_i}$ is the Euclidean distance function of the zero level set
of $\phi_i$ (one can think of it as the absolute value of $\phi_i$), and
$Z(\phi_i)$ is the zero level set of $\phi_i$. This distance measure allows
for partial shape matching and was shown [15] to perform better (empirically)
than the Euclidean $L_2$ norm. Note that $\hat{\phi}$ is only an approximate
pre-image of the projection, since an exact pre-image may not exist.
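
A discrete version of this measure can be sketched directly on voxel grids. The sketch below makes two assumptions that are not stated in the text: the zero level set is approximated by the voxels with $|\phi| \le$ `band`, and the Euclidean distance function is approximated by $|\phi|$ (as the parenthetical remark above suggests). The name `shape_distance_sq` and the parameter `band` are illustrative.

```python
import numpy as np

def shape_distance_sq(phi_i, phi_j, band=0.5):
    """Discrete sketch of the symmetric shape distance between two SDFs.

    phi_i, phi_j: arrays of the same shape holding signed distance values.
    band: half-width (in voxels) of the strip treated as the zero level set;
          this threshold is an assumption of the sketch.
    """
    zero_i = np.abs(phi_i) <= band     # voxels approximating Z(phi_i)
    zero_j = np.abs(phi_j) <= band     # voxels approximating Z(phi_j)
    # Sum the distance to the other shape over each shape's boundary voxels.
    return np.abs(phi_j)[zero_i].sum() + np.abs(phi_i)[zero_j].sum()
```

The measure is symmetric by construction and is close to zero when both arguments are the same shape.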

If we use the kernel $k(\phi_i,\phi_j) = \langle \phi_i, \phi_j \rangle$, then
KPCA is equivalent to LPCA. Thus, linear PCA is a particular case of kernel
PCA. Choosing the right kernel for a given data set is a topic of active
research. In this work we have used the Gaussian kernel (6), which is the most
commonly used kernel in the machine learning community.


2.3  Locally  linear  embedding 

The LLE algorithm [19] is based on certain simple geometric principles.
Suppose the data consist of $n$ vectors $\phi_i$ sampled from some smooth
underlying manifold. Provided there is sufficient data, we expect each data
point and its neighbors to lie on or close to a locally linear patch of the
manifold. We can characterize the local geometry of these patches by a set of
coefficients that reconstruct each data point from its neighbors. In the
simplest formulation of LLE, one identifies the $k$ nearest neighbors of a
data point. The reconstruction error is then measured by the cost function
$E(W) = \| \phi - \sum_j W_j \phi_j \|^2$. We seek to minimize the
reconstruction error $E(W)$, subject to the constraint that the weights $W_j$
of points that lie outside the neighborhood are zero and $\sum_j W_j = 1$.
With these constraints, the weights for points in the neighborhood of $\phi$
can be obtained as [20]:

$$ E(W) = \Big\| \sum_{j=1}^{k} W_j (\phi - \phi_j) \Big\|^2 = \sum_{j=1}^{k}\sum_{m=1}^{k} W_j W_m Q_{jm}, \qquad W_j = \frac{\sum_{m=1}^{k} R_{jm}}{\sum_{p=1}^{k}\sum_{q=1}^{k} R_{pq}}, \tag{10} $$

where $Q_{jm} = (\phi - \phi_j)^T(\phi - \phi_m)$ and $R = Q^{-1}$.


In applications where dimensionality reduction is the major objective, one
proceeds further and computes a low dimensional vector corresponding to each
$\phi_i$, preserving the neighborhood structure by keeping the weights $W_j$
constant [20]. This work uses LLE only for obtaining the neighborhood
structure in the training set and not for dimensionality reduction. Thus, we
assume that a closed surface $S$ can be represented by a linear combination of
its $k$ nearest neighbors. Stacking all the columns of $\phi_i$ one below the
other, one obtains a vector of dimension $D^2$ if $\phi_i$ is of dimension
$D \times D$. Thus, given a test point $\phi$, one can obtain the weights
using equation (10) that minimize the reconstruction error $E(W)$. The nearest
neighbors are obtained from the training set by finding the squared distance
$d^2$ (equation 9) between $\phi$ and each of the shapes $\phi_i$ in the
training set.
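
The weight computation for a single test point can be sketched as follows. The function name `lle_weights` is illustrative, and the small ridge term `reg` is an assumption added for numerical stability (the matrix $Q$ is singular whenever the test point lies exactly in the affine hull of its neighbors); it is not part of the formula above.

```python
import numpy as np

def lle_weights(phi, neighbors, reg=1e-8):
    """Reconstruction weights of phi from its k nearest neighbors.

    phi: (N,) flattened test shape.
    neighbors: (k, N) array, one flattened neighbor per row.
    reg: relative ridge added to Q (assumption, for numerical stability).
    """
    diff = phi - neighbors                       # row j is phi - phi_j
    Q = diff @ diff.T                            # Q_jm = (phi - phi_j)^T (phi - phi_m)
    Q = Q + reg * np.trace(Q) * np.eye(len(Q))
    # Solving Q w = 1 and normalizing gives sum_m R_jm / sum_pq R_pq.
    w = np.linalg.solve(Q, np.ones(len(Q)))
    return w / w.sum()
```

The LLE estimate of the test shape is then `w @ neighbors`, the weighted combination of the neighbors.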


2.4  Kernel  LLE 


Mercer kernels have been used quite successfully for learning in Support
Vector Machines (SVM) and in KPCA, as mentioned before. The above LLE
algorithm can be generalized to nonlinear manifolds by employing the kernel
trick [14]. In [21], the author compares the discriminative power of LLE, KLLE
and LPCA by projecting the training data to a lower dimensional space and
comparing the recognition rate for a given test sample. The methods presented
in this work are quite different from those proposed in [21], since we do not
compute a low dimensional representation for LLE or KLLE, but compare their
performance in the input space itself. This is essential for shape analysis,
in which one needs to compute how accurately a given data point can be
reproduced in the input space using these techniques. Thus, the method
proposed in [21] uses LLE and KLLE only for classification purposes, while we
evaluate their performance in the input space. A major contribution of this
work is the formulation of a method to find the pre-image of the projection in
the kernel space, given that we do not know the mapping $\varphi$.

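The basic idea behind KLLE is to minimize the error (given a test point
$\phi$) $E(W) = \| \varphi(\phi) - \sum_j W_j \varphi(\phi_j) \|^2$.
Proceeding as shown for LLE before, we get the following expression for the
weights:

$$ W_j = \frac{\sum_{m=1}^{k} R_{jm}}{\sum_{p=1}^{k}\sum_{q=1}^{k} R_{pq}}, \tag{11} $$

where

$$ Q_{jm} = (\varphi(\phi) - \varphi(\phi_j))^T(\varphi(\phi) - \varphi(\phi_m)) = k(\phi,\phi) - k(\phi,\phi_m) - k(\phi,\phi_j) + k(\phi_j,\phi_m) $$

and $R = Q^{-1}$.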
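
Because $Q$ depends on the data only through kernel values, the weights can be computed without ever forming $\varphi$. The sketch below takes precomputed kernel values as input; the name `klle_weights` and the ridge parameter `reg` are assumptions of this illustration, not part of the original method description.

```python
import numpy as np

def klle_weights(k_phi_phi, k_phi_nb, K_nb, reg=1e-8):
    """Kernel LLE weights from kernel evaluations only.

    k_phi_phi: scalar k(phi, phi) for the test point.
    k_phi_nb:  (k,) vector with entries k(phi, phi_j).
    K_nb:      (k, k) matrix with entries k(phi_j, phi_m).
    reg:       relative ridge added to Q (assumption, for stability).
    """
    # Q_jm = k(phi,phi) - k(phi,phi_j) - k(phi,phi_m) + k(phi_j,phi_m)
    Q = k_phi_phi - k_phi_nb[:, None] - k_phi_nb[None, :] + K_nb
    Q = Q + reg * np.trace(Q) * np.eye(len(Q))
    w = np.linalg.solve(Q, np.ones(len(Q)))
    return w / w.sum()
```

Passing inner products (a linear kernel) reproduces the ordinary LLE weights, which is a useful consistency check.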

The weights $W_j$ so obtained minimize the error $E(W)$ in the feature space
$F$, i.e., $\varphi(\phi) = \sum_j W_j \varphi(\phi_j) + e$, where the
residual $e$ satisfies $\|e\|^2 = E(W)$. Assuming $E$ to be small, we have
$\varphi(\phi) \approx \sum_j W_j \varphi(\phi_j)$. Our goal now is to find
the pre-image of this reconstruction. However, an exact pre-image may not
exist [16], hence we find an approximate pre-image in the input space $\chi$.
Thus, we want to find the point $\varphi(z)$ which is closest to
$\varphi(\phi)$ and for which the pre-image can be computed. This can be
achieved by minimizing the following:

$$ \rho(z) = \| \varphi(z) - \varphi(\phi) \|^2 \approx k(z,z) - 2\sum_j W_j k(z,\phi_j) + k(\phi,\phi), $$

where we have substituted the approximation for $\varphi(\phi)$ in the cross
term. Setting $\nabla_z \rho(z) = 0$ and using the kernel
$k(z,\phi) = \exp(-\|z - \phi\|^2 / (2\sigma^2))$, one gets the following
expression for finding $z$:

$$ z = \frac{\sum_{j=1}^{k} W_j\,k(z,\phi_j)\,\phi_j}{\sum_{j=1}^{k} W_j\,k(z,\phi_j)} \tag{12} $$

This equation contains $z$ on both sides and hence can be solved by a
fixed-point iteration technique. However, the solution will depend on the
starting point and will be very susceptible to local minima. A unique (but
approximate) solution to (12) can be found by noting that

$$ k(z,\phi_j) = \langle \varphi(z), \varphi(\phi_j) \rangle \approx \langle \varphi(\phi), \varphi(\phi_j) \rangle = k(\phi,\phi_j), $$

where we assume $\varphi(\phi) \approx \varphi(z)$. Note that this assumption
is valid since we are trying to find the point $\varphi(z)$ that is closest to
$\varphi(\phi)$. The error in the computed pre-image will be proportional to
the error in approximating $\varphi(z) = \varphi(\phi)$, which in general can
be assumed to be small. As shown in [15], better results can be obtained if
the distance measure (9) (for $d^2$) is used in the Gaussian kernel instead of
the Euclidean $L_2$ norm, and hence we use it in all our experiments as
described in the next section.
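
Putting the pieces together, a closed-form KLLE estimate for one test shape might look like the sketch below. For simplicity it assumes a Gaussian kernel built on the Euclidean norm rather than on the distance measure (9), and it adds a small ridge `reg` to $Q$. The function name `klle_project` is hypothetical.

```python
import numpy as np

def klle_project(phi, neighbors, sigma, reg=1e-8):
    """Approximate KLLE pre-image of phi from its k nearest neighbors.

    phi: (N,) flattened test shape; neighbors: (k, N) array.
    Uses a Euclidean Gaussian kernel (assumption of this sketch).
    Returns the pre-image z and the weights W.
    """
    # Kernel values between the test point and each neighbor.
    k_phi = np.exp(-np.sum((neighbors - phi) ** 2, axis=1) / (2 * sigma ** 2))
    # Kernel matrix among the neighbors (k x k x N temporary; fine for small k).
    diff = neighbors[:, None, :] - neighbors[None, :, :]
    K = np.exp(-np.sum(diff ** 2, axis=2) / (2 * sigma ** 2))
    # Weights from the kernelized Q matrix, with k(phi, phi) = 1.
    Q = 1.0 - k_phi[:, None] - k_phi[None, :] + K
    Q += reg * np.trace(Q) * np.eye(len(Q))
    w = np.linalg.solve(Q, np.ones(len(Q)))
    w /= w.sum()
    # Closed-form pre-image: replace k(z, phi_j) by k(phi, phi_j).
    c = w * k_phi
    return (c @ neighbors) / c.sum(), w
```

No iteration and no eigen-decomposition are involved, only the solution of a $k \times k$ linear system, which is consistent with the lower computational cost claimed for KLLE relative to KPCA.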

A pre-image can be computed not only for a Gaussian kernel, but for any
invertible kernel. If we assume a polynomial kernel
$k(\phi_i,\phi_j) = (c + \phi_i^T\phi_j)^d$, where $d$ is the degree of the
polynomial and $c$ is any constant, then the pre-image $z$ of a point is given
by


(13) 
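
As a hedged sketch of how a pre-image expression of this kind can be derived (this is our own working under the same approximation $\varphi(z) \approx \varphi(\phi)$ used for the Gaussian kernel, and may differ in form from the expression intended here), one can set the gradient of $\rho(z)$ to zero for the polynomial kernel:

```latex
\begin{align*}
\rho(z) &\approx (c + z^T z)^d - 2\sum_{j=1}^{k} W_j\,(c + z^T\phi_j)^d + k(\phi,\phi), \\
\nabla_z \rho(z) &= 2d\,(c + z^T z)^{d-1} z
   - 2d \sum_{j=1}^{k} W_j\,(c + z^T\phi_j)^{d-1}\phi_j = 0, \\
z &= \sum_{j=1}^{k} W_j \left( \frac{c + z^T\phi_j}{c + z^T z} \right)^{d-1} \phi_j
   \;\approx\; \sum_{j=1}^{k} W_j
   \left( \frac{k(\phi,\phi_j)}{k(\phi,\phi)} \right)^{\frac{d-1}{d}} \phi_j .
\end{align*}
```

For $d = 1$ the ratio drops out and $z = \sum_j W_j \phi_j$, which is the ordinary LLE reconstruction.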


Thus,  LLE  is  a  particular  case  of  KLLE  with  a  polynomial  kernel  of  degree 
d  =  1  and  c  =  0.  Once  again,  the  k  nearest  neighbors  can  be  computed  using 
the  distance  relation  (9)  or  any  other  metric  on  the  space  of  shapes  [22,11,7, 
23-25]. 


3  Experiments 

In this section, we describe two experiments to test how well each method
performs given a training set of shapes. The first set of 3D shapes consists
of the left caudate nucleus and the second set consists of the left
hippocampus. These are structures in the brain for which a shape prior is
often used in segmentation algorithms. A typical way to test the performance
of these methods is to see how well an unseen shape is projected by each of
them. In this work, a quantitative measure was calculated by finding the
number of voxels that were mislabelled, i.e., by finding the set symmetric
difference between the projection and the original test shape.
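
The error measure can be sketched in a few lines. The sketch assumes the convention that the SDF is negative inside the shape, which the text does not state; the function name `mislabelled_voxels` is illustrative.

```python
import numpy as np

def mislabelled_voxels(phi_est, phi_true):
    """Number of voxels in the symmetric difference of two shapes.

    Both inputs are SDF arrays of the same shape. The interior is taken to be
    where the SDF is negative (sign convention assumed for this sketch).
    """
    inside_est = phi_est < 0
    inside_true = phi_true < 0
    # XOR marks voxels that belong to exactly one of the two shapes.
    return int(np.count_nonzero(inside_est ^ inside_true))
```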

The training set for the caudate nuclei consisted of 26 elements, each of
them embedded in a signed distance function. Figure 1 shows a few shapes in
the training set. Figure 2 shows an “unseen shape” (i.e., a shape not in the
training set) together with the pre-image of the projection obtained using
each of the methods. Table 1 shows the number of mislabelled voxels for each
of the methods. For LPCA and KPCA, 20 coefficients were used in finding the
projection, while for LLE and KLLE 20 nearest neighbors were used, so that the
results are not biased in favor of a particular method. In every case, the
kernel methods perform better than their linear counterparts. More
specifically, KLLE performs almost as well as or better than KPCA, but with a
smaller computational burden.

Table 1. Mislabelled voxels for left caudate nucleus

Volume   Volume Size   LPCA   LLE   KPCA   KLLE
1        2750          119    50    37     42
2        3774          134    105   92     81
3        2489          108    66    57     52

Fig.  1:  Sample  shapes  of  left  Caudate  nucleus  from  the  training  set. 

The second training set, the hippocampi data, contained 22 elements. Figure 3
shows a few shapes from the training set and Figure 4 shows the original and
the pre-images of the projection for each of the methods. For this experiment,
we used 15 coefficients for LPCA and KPCA and 15 nearest neighbors for LLE and
KLLE. Table 2 gives the number of mislabelled voxels for each of the methods.
Figure 5 shows the weights assigned to each of the neighbors (for all three
test shapes) using LLE and KLLE. KLLE assigns larger weights to points closer
to the test shape than to points farther away. Thus, only points in the
locally linear patch of the feature space are assigned significant weights,
whereas other points are assigned weights close to zero. This nonlinear
distribution is expected since we used a Gaussian kernel. Once again, KLLE
performs better than all the other methods (Table 2). It should be noted that
LLE and KLLE can perform even better with a proper choice of the number of
nearest neighbors, as given in [20]. To make a fair assessment of each method,
we kept $k$ (the number of nearest neighbors) fixed and did not optimize the
algorithm as given in [20].


Table 2. Mislabelled voxels for left hippocampus

Volume   Volume Size   LPCA   LLE   KPCA   KLLE
1        1117          440    378   322    296
2        1108          306    258   212    205
3        1568          804    574   494    371

Fig. 2: Projection of left caudate nucleus (Volume 1) using each of the
methods. Panels: (a) Original, (b) LPCA, (c) LLE, (d) KPCA, (e) KLLE.


Fig.  3:  Sample  shapes  of  left  hippocampus  from  the  training  set. 

In all of the experiments above, the parameter $\sigma$ used in the Gaussian
kernel was fixed to be some function of the average minimum distance between
shapes in the training set [8], i.e.,
$\sigma^2 = c\,\frac{1}{n}\sum_{i=1}^{n} \min_{j \neq i} d^2(\phi_i,\phi_j)$,
where $c$ is a user-defined real number. The training data (hand-segmented
shapes) were obtained from the NAMIC data repository of the Brigham and
Women's Hospital, Boston, MA. The entire code was written in C++ using the ITK
and VTK libraries.
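
The kernel width rule can be sketched as follows, assuming the pairwise squared distances between training shapes have already been collected in a matrix `D2`. The function name `kernel_width_sq` and the default `c=1.0` are illustrative choices, not values from the experiments.

```python
import numpy as np

def kernel_width_sq(D2, c=1.0):
    """sigma^2 = c * mean over i of min over j != i of d^2(phi_i, phi_j).

    D2: (n, n) matrix of squared distances between training shapes.
    c:  user-defined scale factor (default chosen only for illustration).
    """
    D = np.array(D2, dtype=float)      # work on a copy
    np.fill_diagonal(D, np.inf)        # exclude j == i from the minimum
    return c * D.min(axis=1).mean()
```

For example, with squared distances 1, 4 and 9 between three shapes, the nearest-neighbor minima are 1, 1 and 4, so `kernel_width_sq(D2, c=1.5)` gives 3.0.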

Fig. 4: Projection of left hippocampus (Volume 3) using each of the methods.
Panels: (a) Original, (b) LPCA, (c) LLE, (d) KPCA, (e) KLLE.

Fig. 5: Weights assigned to the 15 nearest neighbors by LLE and KLLE for each
of the test shapes of hippocampus. On the x-axis, 1 is the closest neighbor,
15 is the farthest.

4  Remarks

In this paper, we have proposed a new algorithm for finding an approximate
pre-image of a point in the kernel space in the context of kernel LLE, which
is a generalization of LLE to the kernel space. We have compared this method
with other methods such as linear PCA, kernel PCA and LLE in terms of its
capability to represent unseen shapes. Experiments show that it performs
better than LPCA and LLE and is comparable to KPCA, but with considerably
fewer computations. We certainly do not claim that KLLE is the best method to
use for any given training set of shapes, but it did give good results on the
data on which it was tested.

Nevertheless, representing a shape using its nearest neighbors requires that
the training set contain sufficient data points. LPCA and KPCA have an innate
capability to “produce” shapes by varying the PCA coefficients. This is not
the case with LLE or KLLE. On the other hand, if a sufficient amount of data
is available, LLE and KLLE can perform better than PCA-based algorithms.
Another advantage of LLE and KLLE is that they allow one to learn shapes of
completely different geometries within the same training set. The reason for
this is that these methods use only the nearest neighbors to find the
projection instead of using the entire training set, as is the case with KPCA
and LPCA. One of the reasons why the kernel methods work better than their
linear counterparts is that the set of signed distance functions (SDFs) is not
closed under addition. Thus, the variations captured by linear methods are the
variations in the SDFs and not in the embedded shapes, whereas the kernel
methods capture the variations in the shapes and not in the embeddings. We
should also note that the performance of all these methods will improve if one
has a large training set (with shapes of the same object).

In this work, we have used signed distance functions to represent shapes.
However, the algorithms used here do not depend on any particular type of
representation. Performing a detailed comparative analysis using all of these
methods with different representations (parametric and implicit) for shapes is
the subject of future research. We would also like to test these methods on a
wide variety of shapes with varying sizes of the training data set.